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Abstract 



Bundle adjustment (BA) is the problem of refining a vi- 
sual reconstruction to produce better structure and viewing 
parameter estimates. This problem is often formulated as 
a nonlinear least squares problem, where data arises from 
interest point matching. Mismatched interest points cause 
serious problems in this approach, as a single mismatch 
will affect the entire reconstruction. In this paper, we pro- 
pose a novel robust Student's t BA algorithm (RST-BA). We 
model reprojection errors using the heavy tailed Student's 
t- distribution, and use an implicit trust region method to 
compute the maximum a posteriori (MAP) estimate of the 
camera and viewing parameters in this model. The result- 
ing algorithm exploits the sparse structure essential for re- 
constructing multi-image scenarios, has the same time com- 
plexity as standard L2 bundle adjustment (L2-BA), and can 
be implemented with minimal changes to the standard least 
squares framework. We show that the RST-BA is more ac- 
curate than either L2-BA or L2-BA with a a -edit rule for 
outlier removal for a range of simulated error generation 
scenarios. The new method has also been used to recon- 
struct lunar topography using data from the NASA Apollo 
15 orbiter, and we present visual and quantitative compar- 
isons of RST-BA and L2-BA methods on this application. In 
particular, using the RST-BA algorithm we were able to re- 
construct a DEM from unprocessed data with many outliers 
and no ground control points, which was not possible with 
the L2-BA method. 



1. Introduction 

Bundle adjustment is a large sparse geometric parame- 
ter estimation problem, in which parameters are 3D feature 
coordinates and camera poses. Classically, bundle adjust- 
ment is formulated in the nonlinear least squares framework 
(see [28] and sources cited within). The goal of bundle 
adjustment is to refine a visual reconstruction by identify- 
ing a sparse cloud of tie-point features in multiple images, 
matching tie-points common to several images, and adjust- 
ing camera and world point parameters simultaneously to 
improve the reconstruction. Prior information about cam- 
era parameters and ground control points are incorporated 
using the Bayesian modeling framework, and camera pa- 
rameters and 3D world coordinates of the tie-point features 
are estimated using the matched tie-point features as data. 

The motivating problem for our work in robust bundle 
adjustment is to produce a robust reconstruction in the pres- 
ence of outliers generated due to the misidentification of 
tie-points across images. It is virtually impossible to en- 
sure that automated algorithms for finding and matching tie- 
point features always generate correct correspondences. A 
single wrong identification can lead to a large phantom er- 
ror which dominates other 'good' data in the least squares 
framework. 

To derive our approach, we begin with a statistical model 
for the reprojection errors in pixel space as well as ini- 
tial camera parameter errors and ground control point er- 
rors, and then find a maximum a posteriori estimate for this 
model using an implicit trust region optimization method. 
We model both reprojection errors and prior uncertainty on 
cameras and ground control points as distributed according 
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to the Student's t-distribution. 

There are many approaches that try to incorporate the 
idea of 'robust cost' or 'fat tails' into the bundle adjust- 
ment framework (see [28], [11]). Most of these schemes 
require some iterative re- weighting of the least squares ob- 
jective. However, these approaches lack a progress metric 
or any optimality guarantees, which limits possibilities for 
algorithm design and convergence criteria. We begin in- 
stead with distributional assumptions on reprojection errors, 
derive a corresponding maximum a posteriori (MAP) opti- 
mization problem, and develop an algorithm to solve it. Our 
approach takes advantage of the sparse structure described 
in [16] and [12], and determines the best camera parame- 
ters by solving a maximum likelihood (ML) problem, using 
an algorithm that still follows the main idea in [ ] while 
exploiting the sparse structure in [11]. 

The resulting robust Student's t BA (RST-BA) preserves 
the sparse structure and efficiency of the method used 
in [16], allowing the robust implementation to run in a com- 
parable time to the standard BA method. Compared with 
automated outlier removal schemes and ad-hoc reweighting 
using 'robust-cost' functions, RST-BA is faster and more 
accurate both on synthetic data and in real applications. 
Moreover, it is straightforward to implement with minimal 
changes to the standard least- squares framework. 

Mismatches in tie-point features is a well known prob- 
lem, and there are a variety of approaches in the literature 
for robustification and dealing with outliers. Threshold- 
based outlier removal is a common approach with many 
variants ([27], [22], [20], [21], [4], [9], [15], [13]). After 
one or more iterations of bundle adjustment, observations 
with residuals exceeding some predetermined threshold are 
removed from the dataset and the bundle adjustment opti- 
mization is run again on the remaining observations. The 
choice of threshold depends on the particular problem and 
dataset. [27], for instance, uses a threshold of 1.5 stan- 
dard deviations from the mean residual error, while [21] 
discards residuals with size to variance ratio greater than 



3. Generally, some a-edit rule is applied, with a typical 
range between 1 to 2 standard deviations. However, there 
are problems with using a-edit rules even in the linear re- 
gression context [ , Chapter 1], and it is particularly limit- 
ing in nonlinear problems because outliers affect the initial 
fit, which is then used for classification of outliers and in- 
liers. 

Another common approach is iterative reweighting ([8], 
[7], [20], [23]). In this method, the observations are 
weighted at each iteration in inverse proportion to their 
residual errors. Observations may also be removed accord- 
ing to a a-edit rule with a higher threshold [ ]. Closely 
related to this approach is that of 'robust cost functions', 
which is discussed in [11] and implemented in [3] and [ ]. 
The main problems with ad hoc reweighting according to 
such cost functions is the lack of a convergence theory that 
guarantees the algorithm will stop at a stationary point of 
the objective, and in many cases the lack of an explicit ob- 
jective function. 

Model-based bundle adjustment is yet another robust 
technique, in which prior information about the surface un- 
der reconstruction is used to aid in the adjustment [7]. [29] 
proposes a bundle adjustment method which does not in- 
volve solving for the camera parameters, in order to reduce 
numerical instability and decrease sensitivity to noise. We 
do not consider these approaches, since we have no prior 
information on the surface we are reconstructing, and re- 
finement of the camera parameters is essential to our appli- 
cation. 

The paper proceeds as follows. In Section 2 we formu- 
late standard bundle adjustment as a nonlinear least squares 
problem that can be solved with L2-BA. In Section 3 we de- 
scribe the multivariate Student's t-distribution. In Section 4, 
we formulate robust bundle adjustment as the maximum a 
posteriori likelihood problem and develop the RST-BA al- 
gorithm to solve it. In Section 5, we use simulated data to 
compare RST-BA with the standard L2-BA algorithm, and 
with L2-BA combined with a 2-sigma edit rule for remov- 



ing outliers. In Section 6, we show the results of applying 
the RST-BA algorithm to the problem of reconstructing lu- 
nar topography from NASA Apollo 15 orbital imagery, and 
compare the results with results obtained from L 2 -BA for 
both processed and unprocessed data. 

2. Mathematical Model for BA 

Suppose that we have m images taken by a moving cam- 
era, or equivalently by m cameras with different poses (lo- 
cations and attitudes). Suppose that multiple tie-point fea- 
tures are identified in each image, and that a single feature 
may appear in several images. Denote by S the control net- 
work of indices describing the image tie-point matching, so 
that (z, j) G S if feature i was seen in image j. Let Xj de- 
note the pose of the jth camera, and let yi denote the 3D 
world coordiantes of feature i. Let zij be the pixel coor- 
dinates of the projection h(xj,yi) of feature i onto image 
j, 6ij be the reprojection error z^j — h{xj,yi), and be 
the covariance matrix associated to . Define x® and y® to 
be prior estimates of camera parameters and ground control 
points, with covariances Qj and $>i, respectively. All the 
data except for {xj} and {^} are known and given. The 
standard statistical model for the BA problem is as follows: 



Zij — h(xj,y<i)-\- 
€ij ~ (0,£ij) 
Xj ~ 



(1) 



The standard L 2 approach to bundle adjustment is to assume 
that €ij , Xj , and yi are all normally distributed, and then the 
maximum a posteriori likelihood solution is equivalent to 
minimizing the objective 



w.r.t. 
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The prior terms for camera parameters and ground con- 
trol points are added to deal with gauge freedom. Ground 
control points can be used when 3D coordinates of certain 
tie-point features are well known, which is the case for Lu- 
nar topography data. Note that if there is no prior informa- 
tion on a particular Xj or y i9 we simply set the correspond- 
ing QJ 1 or ^r 1 toOin (2). 

The standard approach to bundle adjustment is to min- 
imize the objective (2) using implicit trust region meth- 
ods, and in particular variants of the Levenberg-Marquardt 



method are very popular (see [18], [11], [26], [25], [6] for 
more details on these methods). For our implementation 
of L2-BA we use a particular variant of the Levenberg- 
Marquard detailed in [ , Algorithm 3.16], which is also 
used in the SB A implementation [ ]. The method of 
choosing a cloud of points that 'links' the images together 
gives rise to a sparse structure, and we exploit this structure 
as described in [ , Algorithm A6.4]. 

3. Student's t Approach 

We introduce the following notation: for a vector u G 
R n and any positive definite matrix M G R nxn , let 
\\u\\m •= Vu T Mu. We use the following generalization 
of the Student's t-distribution: 
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(3) 



where /i is the mean parameter, s is the degrees of freedom, 
m is the dimesion of the vector e, R is a positive definite 
matrix, and y/R or R 1 / 2 denotes a Choleski factor; i.e., 
y/RVR T = R 1 / 2 R T / 2 = R. A comparison of this dis- 
tribution with the Gaussian distribution assumed in (2) and 
the Laplace distribution appears in Figure 1 . Note that the 
Student's t-distribution has much thicker tails than the oth- 
ers, and that its influence function is redescending (see [19] 
for a discussion of influence functions). 

The main idea of the RST-BA algorithm is to assume 
that reprojection errors come from the Student's t- 
distribution (3). We also assume the Student's t-distribution 
prior for the initial camera parameters {x®} and ground 
control points {y®}. The intuition behind this approach is 
that extreme observations are much more likely in the Stu- 
dent's t model than in the Gaussian model. Therefore, a 
large residual will affect the overall fit less if fitting is done 
in model (3). See [ ] for more details. 

4. Maximum Likelihood Formulation 

Maximizing the likelihood for our model (1) is equiva- 
lent to minimizing the associated negative log likelihood 

-logp({e ij }) -\ogp({x°j - Xj}) -logp({^° -yi}) 

Dropping the terms that do not depend on {xj} or {yi} our 
objective is 
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Table 1. Relative mean a and standard deviation a of MSE calculated over 1000 runs for L2-BA, L2-BA with the 2cr-edit rule (2cr-BA), 
and RST-BA, presented as: \i (a). Error values for world points and camera XYZ parameters respectively are presented relative to the error 
incurred by L 2 -BA in the nominal case, shown in bold, i.e. where reprojection errors added were distributed as iV(0, 1). 



Noise Type 


World Points 


Camera XYZ 




L 2 -BA 


2<j-BA 


RST-BA 


L 2 -BA 


2<j-BA 


RST-BA 


7V(0,1) 


1.0 (1.3) 


1.0(1.2) 


1.0(1.3) 


1.0 (0.9) 


0.8 (0.7) 


0.7 (1.4) 


.95A^(0,1) + .05A^(0,4) 


1.3 (1.6) 


1.2(1.5) 


1.1 (1.4) 


6.3 (7.6) 


2.7 (4.1) 


3.5(13.3) 


,9JV(0,1) + .1JV(0,4) 


1.5(1.7) 


1.5(1.9) 


1.4(1.7) 


11.5(12.5) 


5.6 (7.6) 


5.9(18.1) 


.95A/X0, 1) + .05A^(0, 10) 


2.7 (3.4) 


1.8(2.0) 


1.2(1.4) 


69 (62) 


23 (27) 


7.3 (23) 


.9JV(0,1) + .1JV(0,10) 


3.6 (4.6) 


2.7 (3.0) 


1.4(1.6) 


101 (76) 


49 (42) 


16.5 (34) 


.95A^(0,1) + .05A^(0,50) 


39 (45) 


21 (30) 


1.9(1.7) 


580 (380) 


306 (242) 


12 (23) 


.9^(0,1) + .1JV(0, 50) 


60 (63) 


44 (47) 


2.5 (2.1) 


740 (510) 


470 (300) 


20 (36) 


Student's t, df = 4 


12.3 (13.5) 


12.2(15.1) 


8.9 (10.2) 


240 (150) 


190(130) 


38 (60) 



where sij , r j , and qi are known degrees of freedom param- 
eters in model (3) associated to observations Zij, prior cam- 
era parameters x®, and ground control points y^, respec- 
tively. The constants 2, 6 and 3 that appear in (4) are the di- 
mensions of the pixel coordinates, camera poses, and world 
points, respectively. 

Minimizing objective (4) provides maximum a posteri- 
ori (MAP) likelihood estimates for parameter vectors {xj} 
and {i/i} in the Student's t model (3). 

Now we describe an implicit trust region algorithm for 
minimizing (4). Given a sequence of column vectors {u k } 
and matrices {X^} we use the notation 



vec({u k }) 



U N 
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We define 

c = vec({xj},{yi}). 

We will now refer to objective (4) as F(c). In order to min- 
imize (4), we implement an iterative method of the form 

c = c k - (iJ* 5 ) -1 VF(c) 



c if F(c)<F(c k ) 



(5) 



otherwise. 



where k indexes the iterations, and H k is a particular pos- 
itive definite matrix described below, which one may think 
of as a Hessian approximation to V 2 F(c k ). 

Let J k = [A k B k ], where A k - = d Xj h(x k ,y k ) and 
B k j = dy t h(x k , y k ). Define weights 



Pij 
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Q k 
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(6) 



Let A k - = p k -A k . and B k . = p k .B k .. Let A k and B k be 
matrices with block components A k j and B k - , respectively, 
and define J k = [A k B k ] . Define 



= (j k ) T S- 1 (J k ) + diag {{Qp 3 }) (?) 
+ diag({^^})+A fc 7 



where \ k is a regularization parameter similar to the 
Levenberg-Marquardt method, and is updated according to 
the rule defined in Algorithm 3.16 of [ ]. Specifically, A 
increases quickly when the iteration (5) fails to improve the 
objective function (4), and otherwise is adjusted according 
to the rule 



v 3' 



(2^ k - If) 



(8) 



where (f) k is the ratio of improvement predicted by the 
quadratic model with Hessian H k to the actual improve- 
ment F k — F k+1 . 

From this presentation, it is clear that the RST-BA algo- 
rithm can be implemented by a simple reweighting of the 
data structures already present in L 2 -BA, and so RST-BA 
takes about the same time per iteration as L 2 -BA. The algo- 
rithm terminates when all the components of VF are below 
10 -6 . In practice, a hard iteration limit is set, since the 
problems are large and it is rarely necessary to solve them 
exactly. We followed this approach in testing and simula- 
tion. 

5. Numerical Simulations 

The RST-BA code used for the simulated and real tests 
is currently implemented as part of Nasa VisionWork- 
bench [ ]. Since our target application is the reconstruc- 
tion of the lunar surface from Apollo orbital imager data, 
our synthetic data was modeled in a similar context. Camera 
positions were generated in a sequence incremented along 



the camera x-axis, with the z-axis of the camera coordi- 
nate system defined to point toward the lunar surface. The 
x-increment was calculated to yield the desired overlap be- 
tween camera fields of view, to guarantee that each point on 
the surface was seen by at least two cameras. Given spec- 
ifications for the camera elevation and location, a synthetic 
surface region was calculated, bounded by the camera field 
of view in the ^/-direction, the combined fields of view of the 
second through the penultimate cameras in the ^-direction, 
and an estimate of minimum and maximum lunar surface 
height in the z-direction. 3-dimensional world points were 
then randomly generated within the volume bounded by this 
surface. Finally, each generated 3-dimensional point was 
projected into the image plane of each camera in which it 
was visible, giving a set of image coordinates for each point 
and camera pair. 

After generation of the synthetic 3D world points, we 
added several kinds of noise to the "observations" made 
in the simulated system. In the lunar surface reconstruc- 
tion context, observations include the image coordinates of 
each visible point, and extrinsic camera parameters that are 
known up with some precision from the Apollo mission 
telemetry. Our data generator perturbs image coordinates, 
camera positions, and camera pose according to nominal 
Gaussian distributions with specified variance in order to 
simulate measurement noise and camera uncertainty. To 
test the robustness of our algorithms against mistakes in the 
data, we also introduced outliers in the simulated errors ac- 
cording to error schemes we describe below. 

1 . Nominal conditions: The reprojection errors were gen- 
erated using the normal distribution ~ N(0, 1). 

2. Contaminated normal: The reprojection errors were 
generated using a mixture of two normals, i.e., 

e^~(l-p)N(O,O.25)+pN(O,0) 

for values of p G {0.05,0.1} and values of G 
{4,10,50}. 

3. Student's t-distribution: The reprojection errors were 
generated using Student's t-distribution with df = 4. 

For each run of each experiment, we ran the L2-BA algo- 
rithm as the baseline, along with L2-BA combined with a 
2cr-edit rule (removing 'outliers' that were two standards of 
deviation away from the mean and refitting), and the RST- 
BA algorithm. All degrees of freedom parameters for RST- 
BA were set at 4 for all of the experiments. 

The results for our simulated fitting are presented in Ta- 
ble 1. Each experiment was performed 1000 times, and 
we provide the relative median Mean Squared Error (MSE) 
value and standard deviation for the difference between 
'ground truth' and the final estimates of the algorithms, 



for the 3D world points data and for the camera location 
(x,y,z) data. We left the camera pose parameters fixed at 
their true values during the experiment, by placing a very 
strong prior on them. The relative MSE for the world points 
is defined by 

^£ll^-*llij/(MSE ) (9) 

where N is the total number of 3D world points, Xk is the 
fc-th 'true' world point, Xk is the estimate, and MSE is 
the MSE of the baseline BA method in nominal conditions. 
The relative MSE measure for camera coordinates is simi- 
larly defined. 

The L 2 -BA method with the 2a-edit rule works as well 
or better than L2-BA alone. When the variance of the out- 
liers is very large, the 2cr-edit rule cuts the relative error 
nearly in half, for both world points and camera parameters. 
The RST-BA algorithm works about as well as the 2cr-edit 
rule for cases with small outliers, but as the variance of the 
outliers grows, RST-BA cuts the relative error by a factor 
of 30 - an order of magnitude improvement over the 2a- 
edit rule. When the errors are actually generated from the 
Student's t-distribution, RST-BA cuts the camera error by 
a factor of 6 relative to the standard, and achieves a small 
improvement for the world points. 

6. Bundle Adjustment in Orbital Imagery 

To check the performance of the RST-BA algorithm on 
real data, we used imagery captured by the Apollo Met- 
ric Camera (AMC) on board the NASA Apollo 15 orbiter. 
The AMC is a frame camera with a 74 degree field of view 
that captured snapshots of the Moon's surface at regular in- 
tervals. This resulted in overlap between images of 80%. 
The Apollo-era satellite tracking network was highly in- 
accurate by today's standards, with errors estimated to be 
2.04-km for satellite station positions and 0.002 degrees for 
pose estimates. This creates a need for refinement via bun- 
dle adjustment in order to create consistent 3D models be- 
tween stereo pairs (see Figure 2). The specific frames pro- 
cessed were AS15-M-1089 through AS15-M-1159, which 
were part of Apollo 15 's 33rd orbital revolution [14]. 

In order to test the effectiveness of L2-BA against RST- 
BA, two datasets of image measurements were created: 

1 . Processed Apollo tie -point data was created with ex- 
tensive processing and cleaning. First, tie points were 
automatically detected with the SURF [ ] algorithm. 
Then outliers were removed using the RANSAC algo- 
rithm [ ]. Finally the tie points were thinned down 
to 500 matches between pairs by removing the weak- 
est matches while ensuring that the tie points remained 
evenly distributed across each image. 




(d) 

Figure 2. Surface reconstruction from Orbit 33 images. From top 
to bottom: (a) L2-BA, processed data set; (b) RST-BA, processed 
data set; (c) L2-BA, unprocessed data set; (d) RST-BA, unpro- 
cessed data set. Red indicates high elevation, blue indicates low 
elevation. Black indicates elevations out of the range of the color 
map. Ground control points were not used in these experiments. 



2. Unprocessed Apollo tie-point data was created using 
the Interest Point detection algorithm based on SIFT 
[ ]. No outlier rejection was done in this case, yield- 



ing a data set with up to 50% outliers. 

L 2 -BA and RST-BA were tested with both processed and 
unprocessed data sets. Results of these tests are shown in 
Table 2. Here, triangulation error is a measure of the aver- 
age distance between the closest point of intersection of two 
forward projected rays for a set of tie-points. A decrease in 
triangulation error indicates a substantial improvement in 
the self-consistency of the DEMs in the data set. 

After bundle adjustment was complete, we processed the 
imagery using stereo reconstruction tools [24] to produce 
dense topography of the lunar surface. This 3D reconstruc- 
tion used the improved camera extrinsic parameters from 
bundle adjustment to produce a more consistent, seamless 
mosaic of 3D topographic models. Figure 2 shows these 
results with various bundle adjusment tests. Topography is 
represented by a color-map with red indicating high eleva- 
tion and blue low elevation. 

The original (unadjusted) camera parameters show clear 
discontinuities between adjacent models that are due to the 
uncertainties in the original Apollo tracking data. While 
Processed data yields reasonable results when ground con- 
trol points are used, we did not use these points to empha- 
size that RST-BA can be used in their absence while L 2 - 
BA cannot. Without ground control points, L2-BA found a 
'kink' in the DEM, which is responsible for the black sec- 
tions visible in Figure 2 while RST-BA was able to recon- 
struct the DEM. 

The results from the unprocessed data set, which con- 
tained nearly 50% outliers, show a stark difference between 
the two approaches. The L2-BA algorithm failed to cre- 
ate any improvement. Instead, the outliers caused severe 
and unpredictable distorition of the camera parameters. The 
RST-BA algorithm, on the other hand, was nearly unaf- 
fected, and produced results remarkably similar to those 
produced from the processed data set. Table 2 shows that 
median triangulation error were only slightly higher for the 
unprocessed data than they were for the processed data. 
This data suggests that RST-BA is significantly more robust 
to outliers than the standard bundle adjustment technique. 

7. Conclusion 

We have proposed RST-BA, a robust bundle adjustment 
algorithm, based on the Student's t distribution, for per- 
forming bundle adjustment in the presence of outliers in 
tie-point matching. RST-BA preserves the sparse structure, 
and hence the speed, of L2-BA, and can be implemented by 
simple modifications to the L2-BA algorithm. Our test re- 
sults on both synthetic and real data show that when the data 
have been preprocessed to remove outliers in the tie-point 
matches, RST-BA outperforms L 2 -BA by a small margin, 
and on unpreprocessed data with numerous outliers, RST- 
BA outperforms L2-BA by a significant margin. RST-BA 



demonstrates significant advantages in both speed and ac- 
curacy of results over both L2-BA and L2-BA with a 2a- 
edit rule, and can be used to reconstruct DEMs without data 
preprocessing and without ground control points. In future 
work, we will perform an extended comparison of RST-BA 
with "robust" methods such as Cauchy re-weighting in ad- 
dition to the 2<j-edit rule. We will also work on the problem 
of estimating the degrees of freedom parameters, which are 
currently assumed to be known by the RST-BA algorithm. 
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Table 2. Triangulation errors for the Apollo DEM reconstruction, using processed and unprocessed Apollo tie -point data. RST-BA performs 
much better than L2-BA for unprocessed data, and better than L2-BA for processed data. 
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